Skip to content

Conversation

skc7
Copy link
Contributor

@skc7 skc7 commented Jun 24, 2025

Add runtime Assign_omp call which does memcpy of ptrs on device using "omp_target_memcpy".

void RTDEF(Assign_omp)(Descriptor &to, const Descriptor &from, const char *sourceFile, int sourceLine, omp::OMPDeviceTy omp_device)

This PR is pre-requisite for workdistribute lowering PR #140523

Copy link

github-actions bot commented Jun 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@skc7 skc7 requested a review from mjklemm June 24, 2025 06:37
@skc7 skc7 marked this pull request as ready for review July 4, 2025 09:08
@llvmbot llvmbot added the flang Flang issues not falling into any other category label Jul 4, 2025
@skc7 skc7 requested review from ivanradanov, ergawy and kparzysz July 22, 2025 12:56
@skc7 skc7 requested review from Meinersbur and jhuber6 August 4, 2025 08:19
allocatable.cpp
array-constructor.cpp
assign.cpp
assign_omp.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's a good idea to make the Fortran runtime depend on the OpenMP runtime library. I think it makes more sense to have this routine live in the OpenMP offload runtime as a potentially generic API entry point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortran runtime function implementations exist in the OpenMP runtime library like this: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_ftn_entry.h . If it is offload-only, it may also live in libomptarget.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context:
Assign_omp just does omp_target_memcpy between two device pointers.
void RTDEF(Assign_omp)(Descriptor &to, const Descriptor &from, const char *sourceFile, int sourceLine, omp::OMPDeviceTy omp_device)

This api is required, when hoisting "fir.call @_FortranAAssign(...)" from omp.target to the host in "lower-workdistribute" pass #140523

Descriptor struct is defined in flang-rt/runtime/descriptor.h.

Issue:
Now if I need to move this implementation to openmp runtime or libomptarget, I wouldn't have access to fortran-rt Descriptor structure there. Is there any solution to deal with such issue?

Probable solution:
Instead of creating new runtime api, may be, make call to "omp_target_memcpy" directly by extracting the ptrs from Descriptor structure at MLIR stage? Need to check if this is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see two options:

  • Use the CFI of Fortran to access the descriptor. It will sort of tie the OpenMP runtime to the Fortran runtime, but it will do so in an ISO-conforming way.
  • Unpack the descriptor in the code-gen for the assign operation. It might be prudent to provide an accessor function for getting the pointers and size information from the descriptor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions @mjklemm
Have tried second approach of adding Descriptor accessors in flang-rt. Draft PR #152756 is under progress.

Working on adding API in openmp runtime to do omp_target_memcpy between two device pointers.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding tests as well

allocatable.cpp
array-constructor.cpp
assign.cpp
assign_omp.cpp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortran runtime function implementations exist in the OpenMP runtime library like this: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_ftn_entry.h . If it is offload-only, it may also live in libomptarget.

// If not present on the device it should already be a device ptr
if (!omp_target_is_present(voidAnyPtr, ompDevice))
return anyPtr;
T *device_ptr = omp_get_mapped_ptr(anyPtr, ompDevice);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use the same style for variables names: devicePtr.


typedef int32_t OMPDeviceTy;

template <typename T> static T *getDevicePtr(T *anyPtr, OMPDeviceTy ompDevice) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for easier reasability:

Suggested change
template <typename T> static T *getDevicePtr(T *anyPtr, OMPDeviceTy ompDevice) {
template <typename T> static T *getDevicePtr(T *hostPtr, OMPDeviceTy ompDevice) {

std::size_t fromElements{from.Elements()};

if (toElementBytes != fromElementBytes)
terminator.Crash("Assign: toElementBytes != fromElementBytes");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging the number of element bytes (and elements below) in the crash might be helpful when debugging.

terminator.Crash("Assign: toElements != fromElements");

// Get base addresses and calculate length
void *to_base = to.raw().base_addr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Naming style is different in this function as well.

}

RT_API_ATTRS static void Assign(Descriptor &to, const Descriptor &from,
Terminator &terminator, int flags, OMPDeviceTy omp_device) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flags is not used. Do we need it here?

std::size_t toElementBytes{to.ElementBytes()};
std::size_t fromElementBytes{from.ElementBytes()};
std::size_t toElements{to.Elements()};
std::size_t fromElements{from.Elements()};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to check also that descriptors are contiguous. You can have the same number of elements but the stride might be different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want it to work also on non contiguous descriptors, the Assign function as a mechanism to pass memmove function to use.

Comment on lines +1 to +2
//===-- lib/runtime/assign_omp.cpp ----------------------------------*- C++
//-*-===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format

@clementval
Copy link
Contributor

clementval commented Aug 19, 2025

Will this add a hard OpenMP dependency for anyone building the flang runtime?

This should probably be placed in a separate library just for OpenMP like it is done for CUDA Fortran.

@skc7
Copy link
Contributor Author

skc7 commented Aug 20, 2025

Will this add a hard OpenMP dependency for anyone building the flang runtime?

This should probably be placed in a separate library just for OpenMP like it is done for CUDA Fortran.

@clementval As discussed in #152756 and based on your feedback its better to not have openmp rt dependency on flang-rt. Closing this PR.

@skc7 skc7 closed this Aug 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants